Shareware Overload Trio 2

Text File | 1992-07-06 | 34KB | 1,004 lines

PC-SIZE

A Program for Sample Size Determinations

Version 2.13

(c) 1985, 1986

"One of many STATOOLS(tm)..."

by

Gerard E. Dallal
54 High Plain Road
Andover, MA 01810

PC-SIZE determines the sample size requirements for single factor experiments, two factor experiments, randomized blocks designs, and paired t-tests. In generic F mode, PC-SIZE can determine sample sizes for any experiment in which the power at the alternative is given by a non-central F distribution with fixed numerator degrees of freedom, denominator degrees of freedom that are linear in the sample size, and a non-centrality parameter that is proportional to the sample size.

PC-SIZE can determine the sample size needed to detect a non-zero population correlation coefficient when sampling from a bivariate normal distribution. It can also be used to obtain the common sample size required to test the equality of two proportions.

PC-SIZE can calculate the power of specific sample sizes as well as determine the sample size needed to achieve specific power.

NOTICE

Copyright 1985 and 1986 by Gerard E. Dallal. The pair of PC-SIZE programs is shareware. Please see the notice in the documentation for PC-SIZE: Consultant.

Please acknowledge PC-SIZE in any manuscript that uses its calculations.

DISCLAIMER

STATOOLS are provided "as is" without warranty of any kind. The entire risk as to the quality, performance, and fitness for intended purpose is with you. You assume responsibility for the selection of the program and for the use of results obtained from that program.

TABLE OF CONTENTS

Features
Installation
Operation
    Specifying the design
    Specifying the alternative
    Generic F mode
    Initial approximation
Correlation coefficient
Proportions
Paired t-test
Other applications
    Two sample t-test
    Two period cross-over design
    Comparing a single sample to a known standard
Power of specific sample sizes
Non-centrality parameters
Validation
Algorithms
References
Sample size tables for the correlation coefficient

FEATURES

1. Flexibility: Query system for single factor, two factor, randomized blocks designs and paired t-tests. Generic Mode permits sample size calculations for many problems in which the power at the alternative is given by the non-central F distribution.

2. Portability: PC-SIZE is written in FORTRAN 77, but not too far from the 66 standard. To make PC-SIZE run on a VAX, for example, all you need do is modify the I/O unit numbers (contained in a single DATA statement) and an OPEN statement.

3. PC-SIZE will calculate the power of a specific sample size as well as the sample size required to achieve specific power.

4. Calculations may be saved in a designated output file.

5. Double precision calculations are used throughout.

6. Quantities contained in square brackets at the prompts are default values which can be obtained by pressing the return key. Default values are updated with the latest entry for each quantity, thereby simplifying the task of requesting a number of sample size calculations that share many of the same specifications.

7. Trailing decimal points may be omitted or included as you wish.

INSTALLATION

PC-SIZE is written for the IBM-PC.
Installation on a new computer may entail modifying the following statements:

The first DATA statement:

    IIN   -- input unit number (screen)
    IOUT  -- output unit number (screen)
    IWOUT -- save file unit number
    NMAX0 -- large integer constant (the largest sample size that can be considered)

The OPEN statement for the save file just before statement 10.

OPERATION

Operation begins with the user specifying the level of the test and the power required at the alternative. PC-SIZE will report the number of observations per cell, per group (in the case of proportions), or per randomized block.

Specifying the Design

Single factor designs: The user is prompted for the number of groups.

Two factor designs: The user is prompted for the number of levels of each factor. (Estimates are based on the main effects of factor A. Use generic F mode to base estimates on the interaction structure.) The user can then indicate whether an interaction term will be present in the model and the ANOVA table. (A * B * (N - 1) denominator degrees of freedom, where 'A' and 'B' are the number of levels of the two factors, if interaction is present; A*B*N - A - B + 1 denominator degrees of freedom, if not.)

Randomized blocks designs: The user is prompted for the number of levels of the treatment factor. PC-SIZE calculates the number of blocks needed to achieve the desired power assuming each block receives one complete set of treatments.

Paired t-tests: The user is prompted for the expected difference and the standard deviation of the differences.

Specifying the Alternative

In the cases of single factor, two factor, and randomized blocks designs, the user is given three options for specifying the alternative at which the power is to be evaluated:

1. Specifying the individual effects. PC-SIZE automatically centers the effects about zero. It is not necessary to subtract the mean from each effect before entry.

2. Specifying a range (a single number) for the effects. The minimum and maximum effects are assumed to occupy the endpoints of the range with the remaining effects distributed uniformly throughout.

3. Specifying the average squared effect (where, for this option, the mean has been subtracted from each effect before squaring) divided by the error variance.

Generic F Mode

Generic mode requires more sophistication on the part of the user but is capable of handling a wide variety of problems, specifically, any problem for which the power at the alternative is given by a non-central F distribution with fixed numerator degrees of freedom, denominator degrees of freedom that are linear in the sample size, and a non-centrality parameter that is a multiple of the sample size. (Non-centrality parameters are discussed below.) The user is prompted for the numerator degrees of freedom, the linear function that defines the denominator degrees of freedom, and the multiple of the sample size that defines the non-centrality parameter.

Initial Approximation

PC-SIZE invokes a "large sample approximation" (using a non-central chi-square power function in place of the non-central F) to get a rough estimate of the necessary sample size. The power is calculated at increments of 1 if the estimate is less than 500, 10 if the estimate is between 500 and 5000, 100 if the estimate is between 5000 and 50000, and so on. The calculations start at the large sample estimate less 5% or a count of 10, whichever is greater, rounded to the nearest increment, and continue until the required power is obtained.

The correlation coefficient and proportions are handled differently--see below.

CORRELATION COEFFICIENT

This mode is used when sampling from a bivariate normal population, neither of the two variables having its values fixed prior to sampling. PC-SIZE will calculate the sample size needed to carry out a two-tailed test of the hypothesis that the population correlation coefficient is 0.
The user is prompted for a non-null value of the coefficient.

Note: The distribution of the sample correlation coefficient when the population value is non-zero is obtained through numerical integration using Simpson's Rule with some bells and whistles to speed up convergence. Ordinates of the density function are calculated recursively, resulting in an execution time that is proportional to sample size. PC-SIZE reports the power of the test for sample sizes 3, (2**K: K=2,3,...) successively until the required power is exceeded. A binary search is then carried out (with intermediate results NOT reported) to locate the minimum adequate sample size. If the sample size is large, the binary search can consume large amounts of execution time.

The Tables at the end of this document, produced by PC-SIZE, give the necessary sample size for tests of power 0.50(0.10)0.90, 0.95 at levels 0.05 and 0.01 for underlying population correlation coefficients of 0.05, 0.10(0.10)0.90.

PROPORTIONS

PC-SIZE uses formulas 3.18 and 3.19 of Fleiss (1981) to determine the common sample size for a test of the equality of two proportions. This estimate is a large sample approximation based on standard normal theory. The user is prompted for the values of the proportions under the alternative to equality.

Equal sample sizes: In some instances the values produced by PC-SIZE will be 1 greater than those in Fleiss's Table A.3. Fleiss has apparently taken the values produced by the formulae and rounded to the nearest integer. PC-SIZE reports the smallest integer not less than the results of the formulae.

Unequal sample sizes: The user specifies the ratio of sample 2 to sample 1. Calculations are driven by sample 1. The estimate for sample size 2 is obtained by multiplying sample 1's size by the specified ratio and reporting the smallest integer no less than this value. This procedure can lead to situations where (1) the estimated sample sizes are not precisely in the proportions specified and (2) switching the samples' labels and inverting the ratio will produce slightly different estimates. For example (cf. Fleiss, 1981, p. 45), with size of test 0.05 and power at alternative 0.95:

     P1      P2     RATIO    GROUP1   GROUP2
    0.25    0.40    0.50       531      266
    0.40    0.25    2.00       266      532

Use the smallest sample size consistent with the specified ratio that contains the estimates produced by PC-SIZE.

PAIRED T-TEST

PC-SIZE asks for the expected difference and the standard deviation of the differences. Often, a researcher will have some idea of the variances of the individual responses but not of the variance of the difference. In that case, estimate the correlation of the responses and use the relation

    var(X - Y) = var(X) + var(Y) - 2 * corr(X,Y) * SQRT(var(X)*var(Y)) .

If the variances of the two responses are equal, the relation reduces to

    var(X - Y) = var(X) * 2 * (1 - corr(X,Y)) .

OTHER APPLICATIONS

Two Sample t-test

This is a single factor analysis of variance with two groups.

Two period cross-over design

The two period cross-over design can be treated as a paired t-test with one fewer error degrees of freedom than for the paired t-test based on the same total number of observations. Proceed as for a paired t-test, obtaining a sample size of 'n'. For each sequence (AB, BA), take (n+1)/2 observations if 'n' is odd, 1+n/2 if 'n' is even.

Comparing a Single Sample to a Known Standard

Use the paired t-test mode, setting the "expected difference" to the expected difference between the unknown population mean and the known standard. Set the "estimate of standard deviation of difference" to the estimated population standard deviation.

POWER OF SPECIFIC SAMPLE SIZES

PC-SIZE will perform power calculations for specific sample sizes as well as determine the sample size required to achieve specific power.
If the requested power is an integer greater than or equal to 1, PC-SIZE starts its power calculations at a sample size equal to the requested power. The user is prompted for an increment and a stopping value.

NON-CENTRALITY PARAMETERS

Different authors use different definitions of the non-centrality parameter of the non-central F distribution. The differences typically involve a square root, a factor of (numerator degrees of freedom + 1), and/or a factor of 2.

PC-SIZE follows the notation of Kendall and Stuart (1973, pp. 237, 262): The sum of the squares of "d" independent normal variables with arbitrary means and unit variances is said to follow a non-central chi-square distribution with "d" degrees of freedom and non-centrality parameter equal to the sum of the squared means. The ratio of a non-central chi-square variable with "d1" degrees of freedom and non-centrality parameter "lambda", divided by "d1", to an independent central chi-square variable with "d2" degrees of freedom, divided by "d2", is said to follow a non-central F distribution with "d1" numerator degrees of freedom, "d2" denominator degrees of freedom, and non-centrality parameter "lambda". Scheffe (1959, p. 414) defines his non-centrality parameter to be the square root of this quantity.

Following Graybill (1961, Theorem 11.16), a non-centrality parameter can be obtained as the numerator degrees of freedom times (the difference between the numerator expected mean square and the error variance) divided by the error variance. It is assumed that the error variance is given by the expected mean square of the denominator of the F-ratio.

The following notation is used throughout this section:

    ALPHA  -- level of the test
    POWER  -- power at the alternative
    K      -- number of effects under test (number of groups, levels,...)
    F1     -- numerator degrees of freedom
    F2     -- denominator degrees of freedom
    AVGESQ -- average squared effect divided by the error variance
    LAMBDA -- non-centrality parameter
    N      -- sample size
    EVAR   -- error variance (often within cell)
    EFF(I) -- the I-th of the effects under test

    [ AVGESQ = (SUM(EFF(I)**2) / K) / EVAR ]

1. Single Factor Experiment (K Groups):

    LAMBDA = N * SUM(EFF(I)**2) / EVAR
           = N * K * AVGESQ

2. Two Factor Experiment (Factor A -- "A" levels; Factor B -- "B" levels):

    Main effects for Factor A:

    LAMBDA = N * B * SUM(EFF(I)**2) / EVAR
           = N * A * B * AVGESQ

    Two factor interaction:

    LAMBDA = N * SUM(EFF(I)**2) / EVAR
           = N * A * B * AVGESQ

3. Randomized blocks designs (Single treatment factor at K levels):

    LAMBDA = N * SUM(EFF(I)**2) / EVAR
           = N * K * AVGESQ

4. Simple linear regression:

    E(Y(i)) = C0 + C1 * X(i)

    (N observations at each X(i), i=1,...,p, with mean 0)

    LAMBDA = N * (C1**2 * SUM(X(i)**2)) / EVAR

5. Quadratic regression:

    E(Y(i)) = C0 + C1 * X(i) + C2 * X(i)**2

    H0: C1 = C2 = 0:

    LAMBDA = N * (C1**2 * SUM(X(i)**2) + 2 * C1 * C2 * SUM(X(i)**3)
                  + C2**2 * SUM(X(i)**4)) / EVAR

    H0: C2 = 0:

    LAMBDA = C2**2 * SUM(X(i)**4)

VALIDATION

PC-SIZE was validated by applying it to all of the examples in sections 3.2 through 3.6 of Odeh and Fox (1975). All were reproduced, with the following exceptions ("OF estimate" is the Odeh and Fox sample size):

example 3.3.1 (main effects for A with no interaction in the model): OF estimate 3. PC-SIZE calculates the power of a sample of size 3 to be 0.79896 (<0.80). 4 are needed.

example 3.5.2 (test of quadratic regression term): OF estimate 40. PC-SIZE calculates the power of a sample of size 40 to be 0.94796 (<0.95). 41 are needed.

example 3.6.2 (multivariate t-test): OF estimate 100. PC-SIZE calculates the power of a sample of size 100 to be 0.99484 (<0.995). 101 are needed.
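The first exception above can be checked independently. The following is a minimal Python sketch, not PC-SIZE's FORTRAN code: it assumes the Kendall and Stuart definition of the non-centrality parameter given earlier and evaluates the non-central F distribution as a Poisson-weighted sum of incomplete beta integrals, which is a swapped-in technique rather than the published AS routines PC-SIZE uses. The function names (betacf, betai, power_noncentral_f, smallest_n) are illustrative only.

```python
from math import exp, lgamma, log


def betacf(a, b, x, max_iter=1000, eps=1e-14, tiny=1e-300):
    """Continued fraction for the incomplete beta integral (modified Lentz method)."""
    def clamp(v):
        return v if abs(v) > tiny else tiny

    c = 1.0
    d = 1.0 / clamp(1.0 - (a + b) * x / (a + 1.0))
    h = d
    for m in range(1, max_iter + 1):
        even = m * (b - m) * x / ((a + 2 * m - 1.0) * (a + 2 * m))
        odd = -(a + m) * (a + b + m) * x / ((a + 2 * m) * (a + 2 * m + 1.0))
        for coef in (even, odd):
            d = 1.0 / clamp(1.0 + coef * d)
            c = clamp(1.0 + coef / c)
            delta = c * d
            h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h


def betai(a, b, x):
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    front = exp(lgamma(a + b) - lgamma(a) - lgamma(b) + a * log(x) + b * log(1.0 - x))
    if x < (a + 1.0) / (a + b + 2.0):
        return front * betacf(a, b, x) / a
    return 1.0 - front * betacf(b, a, 1.0 - x) / b


def power_noncentral_f(alpha, f1, f2, lam):
    """Power of a level-alpha F test with f1, f2 degrees of freedom at non-centrality lam."""
    # Locate the central-F critical point on the scale x = f1*F / (f1*F + f2) by bisection.
    lo, hi = 0.0, 1.0
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if betai(f1 / 2.0, f2 / 2.0, mid) < 1.0 - alpha:
            lo = mid
        else:
            hi = mid
    x = 0.5 * (lo + hi)
    # Non-central F cdf at the critical point: Poisson(lam/2) mixture of incomplete betas.
    half, cdf, j = lam / 2.0, 0.0, 0
    while True:
        if half > 0.0:
            w = exp(-half + j * log(half) - lgamma(j + 1))
        else:
            w = 1.0 if j == 0 else 0.0
        cdf += w * betai(f1 / 2.0 + j, f2 / 2.0, x)
        if j > half and w < 1e-16:
            break
        j += 1
    return 1.0 - cdf


def smallest_n(alpha, power, f1, f2_slope, f2_const, lam_mult, n_max=100000):
    """Smallest N with F2 = f2_slope*N + f2_const > 0 and LAMBDA = lam_mult*N reaching power."""
    for n in range(1, n_max + 1):
        f2 = f2_slope * n + f2_const
        if f2 > 0 and power_noncentral_f(alpha, f1, f2, lam_mult * n) >= power:
            return n
    return None
```

With the generic mode inputs for example 3.3.1 without interaction (ALPHA = 0.05, F1 = 2, F2 = 6 * N - 4, LAMBDA = 4 * N), power_noncentral_f(0.05, 2, 14, 12.0) should agree closely with the 0.79896 reported for N = 3, and smallest_n(0.05, 0.80, 2, 6, -4, 4.0) should return 4. Unlike PC-SIZE, this sketch steps through every N from 1 upward instead of starting from a large sample approximation.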
The values of the arguments and the resulting sample size estimates from PC-SIZE are:

Single Factor Experiment (K Groups)

    LAMBDA = N * SUM(EFF(I)**2) / EVAR
           = N * K * AVGESQ

Example 3.2.1:
    ALPHA  = 0.05             POWER  = 0.80
    K      = 2                F1     = 1
                              F2     = 2 * (N - 1)
    AVGESQ = 2                LAMBDA = 4 * N
    N      = 4

Example 3.2.2:
    ALPHA  = 0.025            POWER  = 0.70
    K      = 3                F1     = 2
                              F2     = 3 * (N - 1)
    AVGESQ = 1/3              LAMBDA = 1 * N
    N      = 11

Example 3.2.3:
    ALPHA  = 0.01             POWER  = 0.975
    K      = 6                F1     = 5
                              F2     = 6 * (N - 1)
    AVGESQ = 2/3              LAMBDA = 4 * N
    N      = 9

Two Factor Experiment (Factor A -- "A" levels; Factor B -- "B" levels)

    Main effects for Factor A:

    LAMBDA = N * B * SUM(EFF(I)**2) / EVAR
           = N * A * B * AVGESQ

    A * B interaction:

    LAMBDA = N * SUM(EFF(I)**2) / EVAR
           = N * A * B * AVGESQ

    where EFF(i), i=1,...,A*B are the interaction terms.

Example 3.3.1:

  Main effects for A with interaction in model:
    ALPHA  = 0.05             POWER  = 0.80
    A      = 3                F1     = 2
    B      = 2                F2     = 6 * (N - 1)
    AVGESQ = 2/3              LAMBDA = 4 * N
    N      = 4

  Main effects for A with no interaction in model:
    ALPHA  = 0.05             POWER  = 0.80
    A      = 3                F1     = 2
    B      = 2                F2     = 6 * N - 4
    AVGESQ = 2/3              LAMBDA = 4 * N
    N      = 4

  Test for interaction (Use generic mode):
    ALPHA  = 0.05             POWER  = 0.90
    K      = 6                F1     = 2
                              F2     = 6 * (N - 1)
    AVGESQ = 1/2              LAMBDA = 3 * N
    N      = 5

Example 3.3.2:

  Main effects for A with interaction in model:
    ALPHA  = 0.005            POWER  = 0.60
    A      = 4                F1     = 3
    B      = 4                F2     = 16 * (N - 1)
    AVGESQ = 1                LAMBDA = 16 * N
    N      = 2

  Main effects for A with no interaction in model:
    ALPHA  = 0.005            POWER  = 0.60
    A      = 4                F1     = 3
    B      = 4                F2     = 16 * N - 7
    AVGESQ = 1                LAMBDA = 16 * N
    N      = 2

  Test for interaction (Use generic mode):
    ALPHA  = 0.10             POWER  = 0.60
    K      = 16               F1     = 9
                              F2     = 16 * (N - 1)
    AVGESQ = 1/8              LAMBDA = 2 * N
    N      = 5

Example 3.3.3:

  Main effects for A with interaction in model:
    ALPHA  = 0.01             POWER  = 0.70
    A      = 2                F1     = 1
    B      = 3                F2     = 6 * (N - 1)
    AVGESQ = 1                LAMBDA = 6 * N
    N      = 3

  Main effects for A with no interaction in model:
    ALPHA  = 0.01             POWER  = 0.70
    A      = 2                F1     = 1
    B      = 3                F2     = 6 * N - 4
    AVGESQ = 1                LAMBDA = 6 * N
    N      = 3

  Test for interaction (Use generic mode):
    ALPHA  = 0.001            POWER  = 0.90
    K      = 6                F1     = 2
                              F2     = 6 * (N - 1)
    AVGESQ = 1/2              LAMBDA = 3 * N
    N      = 10

Randomized blocks designs (Single treatment factor at K levels)

    LAMBDA = N * SUM(EFF(I)**2) / EVAR
    LAMBDA = N * K * AVGESQ

Example 3.4.1(i):
    ALPHA  = 0.05             POWER  = 0.90
    K      = 3                F1     = 2
                              F2     = 2 * (N - 1)
    AVGESQ = 2/3              LAMBDA = 2 * N
    N      = 8

Example 3.4.1(ii): multiple treatment factors; use generic mode
    ALPHA  = 0.05             POWER  = 0.90
    A = B  = 3                F1     = 2
                              F2     = 8 * (N - 1)
    AVGESQ = 2/3              LAMBDA = 6 * N
    N      = 3

Example 3.4.2: multiple treatment factors; use generic mode
    ALPHA  = 0.001            POWER  = 0.95
    A = B  = 2                F1     = 1
    K      = 1,...,6*N        F2     = 12 * N - 2
    AVGESQ = 1                LAMBDA = 24 * N
    N      = 2

Example 3.4.3: multiple treatment factors; use generic mode
    ALPHA  = 0.025            POWER  = 0.70
    A      = 6                F1     = 5
    B      = 3                F2     = 17 * (N - 1)
    AVGESQ = 1/3              LAMBDA = 6 * N
    N      = 3

Regression using Generic Mode

Simple linear regression

    E(Y(i)) = C0 + C1 * X(i)

    (N observations at each X(i), i=1,...,p, with mean 0)

    LAMBDA = N * (C1**2 * SUM(X(i)**2)) / EVAR

Quadratic regression

    E(Y(i)) = C0 + C1 * X(i) + C2 * X(i)**2

    LAMBDA = N * (C1**2 * SUM(X(i)**2) + 2 * C1 * C2 * SUM(X(i)**3)
                  + C2**2 * SUM(X(i)**4)) / EVAR

Example 3.5.1 (linear):
    ALPHA  = 0.001            POWER  = 0.995
                              F1     = 1
                              F2     = 3 * N - 2
                              LAMBDA = 17 * N
    N      = 5

Example 3.5.1 (quadratic): H0: C1 = C2 = 0
    ALPHA  = 0.001            POWER  = 0.995
                              F1     = 2
                              F2     = 3 * (N - 1)
                              LAMBDA = 144 * N
    N      = 3

Example 3.5.1 (quadratic): H0: C2 = 0
    ALPHA  = 0.001            POWER  = 0.995
                              F1     = 1
                              F2     = 3 * (N - 1)
                              LAMBDA = 257 * N
    N      = 3

Example 3.5.2 (linear):
    ALPHA  = 0.025            POWER  = 0.95
                              F1     = 1
                              F2     = 6 * N - 2
                              LAMBDA = 1.150 * N
    N      = 14

Example 3.5.2 (quadratic): H0: C2 = 0
    ALPHA  = 0.025            POWER  = 0.95
                              F1     = 1
                              F2     = 3 * (N - 1)
                              LAMBDA = .382 * N
    N      = 41

Multivariate t-test

Example 3.6.1:
    ALPHA  = 0.10             POWER  = 0.70
                              F1     = 5
                              F2     = N - 5
                              LAMBDA = 1 * N
    N      = 14

Example 3.6.2:
    ALPHA  = 0.10             POWER  = 0.995
                              F1     = 4
                              F2     = 2 * N - 5
                              LAMBDA = .25 * N
    N      = 101

ALGORITHMS

PC-SIZE makes use of the following published routines, modified to run in double precision:

Best, D.J. and D.E. Roberts (1975). Algorithm AS 91. The percentage points of the chi-squared distribution. Appl. Statist., 24, 385-388.

Bhattacharjee, G.P. (1970). The incomplete gamma integral. Appl. Statist., 19, 285-287.

Cran, G.W., K.J. Martin and G.E. Thomas (1977). Remark AS R19 and Algorithm AS 109. A remark on algorithms AS 63: The incomplete beta integral, and AS 64: Inverse of the incomplete beta function ratio. Appl. Statist., 26, 111-114.

Hill, I.D. (1973). Algorithm AS 66. The normal integral. Appl. Statist., 22, 424-427.

Majumder, K.L. and G.P. Bhattacharjee (1973). Algorithm AS 63. The incomplete beta integral. Appl. Statist., 22, 409-411.

Odeh, R.E. and J.O. Evans (1974). Algorithm AS 70. The percentage points of the normal distribution. Appl. Statist., 23, 96-97.

and the author's FORTRAN translation of

Pike, M.C. and I.D. Hill (1966). Algorithm 291. Logarithm of the gamma function. Commun. Ass. Comput. Mach., 9, 684.

REFERENCES

Fleiss, Joseph L. (1981). Statistical Methods for Rates and Proportions, 2nd ed. New York: John Wiley & Sons, Inc.

Graybill, Franklin A. (1961). An Introduction to Linear Models, Vol. 1. New York: McGraw-Hill Book Company, Inc.

Kendall, Maurice G. and Alan Stuart (1973). The Advanced Theory of Statistics, Volume 2, 3rd ed. New York: Hafner Publishing Co.

Odeh, Robert E. and Martin Fox (1975). Sample Size Choice: Charts for Experiments with Linear Models. New York: Marcel Dekker, Inc.

Scheffe, Henry (1959). The Analysis of Variance. New York: John Wiley and Sons, Inc.

SAMPLE SIZE FOR THE TEST OF A NON-ZERO CORRELATION COEFFICIENT

ALPHA = 0.05

                            POWER
    RHO     0.50    0.60    0.70    0.80    0.90    0.95
    0.05    1536    1959    2467    3137    4198    5192
    0.10     384     489     616     782    1046    1293
    0.20      96     122     153     193     258     319
    0.30      43      54      67      84     112     138
    0.40      24      30      37      46      61      75
    0.50      15      19      23      29      37      46
    0.60      11      13      15      19      24      30
    0.70       8       9      11      13      17      20
    0.80       6       7       8       9      11      13
    0.90       5       5       6       6       8       9

ALPHA = 0.01

                            POWER
    RHO     0.50    0.60    0.70    0.80    0.90    0.95
    0.05    2653    3199    3841    4667    5944    7116
    0.10     662     798     958    1163    1481    1772
    0.20     165     198     237     287     365     436
    0.30      72      87     103     125     158     189
    0.40      40      48      57      68      86     102
    0.50      25      30      35      42      52      62
    0.60      17      20      23      27      34      40
    0.70      12      14      16      19      23      27
    0.80       9      10      11      13      15      18
    0.90       6       7       8       9      10      11
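
The tabled values can be sanity-checked with a quick approximation. The Python sketch below uses Fisher's z transformation, a standard large-sample approximation; it is not the Simpson's Rule integration that PC-SIZE performs, so its results should be expected to differ slightly from the table, more so for large RHO and small samples. The function name fisher_z_sample_size is illustrative only.

```python
from math import atanh, ceil
from statistics import NormalDist


def fisher_z_sample_size(rho, alpha=0.05, power=0.80):
    """Approximate N for a two-tailed test of zero correlation (Fisher z method).

    Treats atanh(r) as normal with variance 1 / (N - 3); this is an
    approximation, not PC-SIZE's exact numerical integration.
    """
    normal = NormalDist()
    z_alpha = normal.inv_cdf(1.0 - alpha / 2.0)  # two-tailed critical value
    z_power = normal.inv_cdf(power)              # quantile for the required power
    return ceil(((z_alpha + z_power) / atanh(rho)) ** 2 + 3.0)


if __name__ == "__main__":
    # Compare a few approximations with the ALPHA = 0.05 table above.
    for rho, power, tabled in [(0.10, 0.50, 384), (0.30, 0.80, 84), (0.50, 0.90, 37)]:
        print(rho, power, fisher_z_sample_size(rho, 0.05, power), tabled)
```

For RHO = 0.30, ALPHA = 0.05 and POWER = 0.80 the approximation gives 85 against the tabled 84, which is the kind of agreement to expect.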